REPORT RESUMES



ED 015 

A PERFORMANCE TEST OF TEACHING EFFECTIVENESS.
BY- POPHAM, W. JAMES    BAKER, EVA L.



SF 001 255 



EDRS PRICE MF-$0.25 HC-$0.36



PUB DATE 19 FEB 66 



DESCRIPTORS- ACADEMIC PERFORMANCE, *EFFECTIVE TEACHING,
INDUSTRIAL EDUCATION, *PREDICTIVE MEASUREMENT, RELIABILITY,
SECONDARY EDUCATION, STANDARDIZED TESTS, STATISTICAL
ANALYSIS, STUDENT ATTITUDES, STUDENT INTERESTS, STUDENT
MOTIVATION, TABLES (DATA), *TEACHER EVALUATION, *TEST
CONSTRUCTION, *TEST VALIDITY, KUDER RICHARDSON FORMULA 20,
WONDERLIC PERSONNEL TEST,



THIS REPORT DESCRIBES THE INITIAL VALIDATION OF
PERFORMANCE TESTS OF TEACHER EFFECTIVENESS, USING PUPIL GAINS
AS THE CRITERION OF EFFECTIVENESS, AND THE STEPS TAKEN IN
RECOGNITION OF THE PROPRIETY OF SUCH MEASURES ONLY IF ALL
TEACHERS ARE TEACHING FOR THE SAME OBJECTIVES. AS A FIRST
STEP, IT WAS HYPOTHESIZED THAT A VALID PERFORMANCE TEST OF
TEACHER EFFECTIVENESS SHOULD DISCRIMINATE BETWEEN TWO EXTREME
GROUPS, (1) NONTEACHERS AND (2) SUPERIOR EXPERIENCED
TEACHERS, BEFORE IT COULD BE USED FOR ASSESSING TEACHERS WHO
DIFFER IN SPECIFIED WAYS (E.G., THOSE WHO ARE AND ARE NOT
INTENSIVELY TRAINED TO BRING ABOUT BEHAVIOR CHANGE IN
STUDENTS). SCORES ON STUDENT ACHIEVEMENT MEASURES ON TWO
INDUSTRIAL EDUCATION TOPICS WERE ASSESSED FOR RELIABILITY AND
INTERCORRELATED WITH MEASURES OF GRADE POINT AVERAGE,
INTEREST IN THE SUBJECT MATTER, AND WITH WONDERLIC PERSONNEL
TEST SCORES. THE OBJECTIVE HERE WAS TO DETECT VARIABLES THAT
COULD POTENTIALLY BE USED TO CONTROL FOR STUDENT DIFFERENCES
IN SUCH FACTORS AS "SET," INTELLIGENCE, ETC., IN ASSESSING
TEACHER EFFECTIVENESS. KUDER-RICHARDSON RELIABILITY
COEFFICIENTS OF .44 AND .78 WERE FOUND FOR THE ACHIEVEMENT
TESTS. TEST SCORES CORRELATED .68 WITH GRADE POINT AVERAGE. A
"PERPLEXING" FINDING WAS HIGHER TEST SCORES AMONG THOSE
EXPRESSING LESS INTEREST IN THE INSTRUCTIONAL TOPIC. PRETEST
SCORES WERE MORE HIGHLY CORRELATED WITH POSTTEST SCORES THAN
WERE WONDERLIC SCORES. (PAPER PRESENTED AT THE 1966 AMER.
EDUC. RES. ASSN. MEETING, CHICAGO, FEBRUARY 17-19, 1966).

(AF) 




Paper Presented at the 1966 American Educational Research Association Meeting
Chicago, Illinois, February 1966

A PERFORMANCE TEST OF TEACHING EFFECTIVENESS*#










W. James Popham and Eva Baker
University of California, Los Angeles



Problem. One of the more enduring and distressing problems in education
has been our inability to develop satisfactory measures of teacher
effectiveness. Although the amount of attention which this problem has
received during the past sixty years is considerable, few really promising
advances have occurred. Generally, three classes of criterion measures
have been employed in previous empirical studies: ratings, observations,
and pupil gain. Most researchers in the field agree that the
ultimate criterion of teacher competence is pupil growth, and usually
ratings and observations of the teacher's behavior have been used as
indications of the instructor's probable influence on pupils.

The prime difficulty in using pupil change as a measure of teaching
proficiency is a consequence of the fact that different teachers often
seek to accomplish different objectives. Turner and Fattu¹ have built a
compelling argument that since teachers' objectives vary from situation
to situation, it is impossible to use measures of teaching effectiveness
which do not take account of such variability, and thus inappropriate to
compare teachers on the basis of their students' growth toward dissimilar
goals. These researchers² have attempted to resolve this dilemma by using
as an index of teaching skill the teacher's ability to solve paper and
pencil problems which represent selected teaching tasks. In their approach,
the teacher is required to perform several different types of tasks, such
as determining the order of materials according to their difficulty level
for pupils.

The rationale for the research described in this paper is similar to
that of Turner and Fattu, except that an actual performance test of the
teacher's ability to produce pupil achievement is used instead of a paper



*The research reported herein was performed pursuant to a contract with
the U.S. Department of Health, Education and Welfare.

#The research reported herein was supported by the Cooperative Research
Program of the U.S.O.E., U.S. Department of Health, Education and Welfare.

¹Richard L. Turner and Nicholas A. Fattu, "Skill in Teaching, a Reappraisal
of the Concepts and Strategies in Teacher Effectiveness Research," Bulletin
of the School of Education, Indiana University, Bloomington, Vol. 36, No. 3,
May, 1960.

²Richard L. Turner, "Task Performance and Teaching Skill in the Intermediate
Grades," The Journal of Teacher Education, Vol. 14, No. 3, September, 1963,
pp. 299-307.

U.S. DEPARTMENT OF HEALTH, EDUCATION & WELFARE
OFFICE OF EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM THE
PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS
STATED DO NOT NECESSARILY REPRESENT OFFICIAL OFFICE OF EDUCATION
POSITION OR POLICY.




and pencil predictive measure. The problem of objectives is hopefully
allayed by giving teachers identical goals to achieve. The assumption
is that a teacher who is a successful goal achiever with given
objectives will, other factors remaining equal, more probably be successful
in achieving his own instructional objectives.



It is with this in mind that a series of performance tests of instructor
competence are being developed at U.C.L.A. under provisions of two
U.S.O.E. [...] education and one is in the social sciences. These tests
consist of (1) a set of operational instructional objectives stated in
terms of specific pupil behaviors, (2) a collection of possible learning
activities which the teacher may wish to employ, and (3) pre- and
post-tests, not seen or administered by the teacher, which adhere closely
to the operational objectives. The objectives and possible activities are
given to the teacher one week in advance of instruction, and he is told to
prepare plans for three weeks of teaching. By stipulating identical
objectives to be achieved and tested but permitting teacher divergence in
accomplishing these ends, a method of evaluating teacher performance
without impinging on individual pedagogical style is provided. Teachers
are compared on the basis of their pupils' achievement rather than other,
more idiosyncratic criteria.



Related Research. A review of teacher effectiveness research since
the turn of the century finds a considerable number of studies, but none
which might be classified as a "breakthrough." Along with the periodic
reviews of the field, such as those by Morsh and Wilder³ and, more recently,
Barr⁴ and Ryans⁵, we have witnessed a number of theoretical and empirical
reports by individuals such as M. Cogan, N. Flanders, N. Gage, D. Medley,
B. Smith, R. Wilk, W. Edson, and J. Withall⁶, to mention but a few.
There is, fortunately, an increasing tendency on the part of researchers
to do away with such simplistic notions as the effective teacher and to
replace these conceptions with a view that the teacher's instructional
proficiency is the function of the particular setting in which the
instruction takes place, including the particular students, the social and
physical environment, and the instructional goals.



³Joseph E. Morsh and Eleanor Wilder, "Identifying the Effective Instructor:
A Review of the Quantitative Studies, 1900-1952," Research Bulletin
AFPTRC-TR-54-44, Lackland Air Force Base, Texas, 1954.

⁴A.S. Barr, issue editor, "Wisconsin Studies of the Measurement and
Prediction of Teacher Effectiveness: A Summary of Investigations," Journal
of Experimental Education, Vol. 30, September, 1961, pp. 5-156.

⁵David G. Ryans, "Assessment of Teacher Behavior and Instruction," Review
of Educational Research, Vol. 33, No. 4, October, 1963, pp. 415-441.

⁶All of these researchers contributed articles to the Symposium on
Classroom Behavior of Teachers, edited by Harry F. Silberman for the Vol. 14,
No. 3, September, 1963, issue of The Journal of Teacher Education.









The present line of research is an outgrowth of the writers' studies
of the influence of certain aspects of teacher education programs on the
classroom instructional behavior of student teachers. In these projects,
the problem of dissimilar instructional goals presented formidable
measurement obstacles. The decision to develop performance tests of teaching
proficiency was reached partly as a result [...]

Objectives. At the end of two years, it is planned to test a specific
validation hypothesis with each of the three performance tests. The
hypothesis is that there will be significant differences in pupil gains
achieved by (1) non-teachers and (2) experienced teachers in a limited
instructional situation provided by the performance tests of teaching
proficiency. Although this hypothesis relates to the validity of the
performance tests, the prior question of their reliability will also be
treated. The predicted outcome of the primary hypothesis test is that the
experienced teachers will produce better student gains than the
non-teachers. At present only two of these tests have been developed to
the point where they have been field tested. This paper will describe the
preliminary results with these two instruments.

It should be pointed out that this two-year project represents only
an initial effort to validate the performance tests. If the tests
withstand this evaluation, more sensitive studies to assess their validity
must be undertaken. The validation of a teacher effectiveness measurement
device is particularly difficult, of course, because of the lack of
defensible criterion measures against which to evaluate teacher performance
on the device. Thus, as a first step in the evaluation of these new
instruments we will attempt to show that at least the tests discriminate
between two extreme groups: (1) non-teachers and (2) experienced teachers
previously judged superior by their supervisors with respect to promoting
pupil achievement.
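
The extreme-groups comparison described above can be sketched in code. The paper does not name the significance test it plans to use, so the following is only an illustration under stated assumptions: it applies Welch's unequal-variance t statistic to class-level mean gains, and the function name `welch_t` and all figures are hypothetical, not data from this study.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and approximate degrees of freedom.

    Each argument is a list of mean pupil gains, one value per class
    taught by a member of that group (hypothetical layout).
    """
    # Squared standard error of each group's mean
    se_a = variance(group_a) / len(group_a)
    se_b = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(group_a) - 1) + se_b ** 2 / (len(group_b) - 1)
    )
    return t, df


# Illustrative values only
experienced_gains = [12, 14, 16]
nonteacher_gains = [8, 10, 12]
t_stat, dof = welch_t(experienced_gains, nonteacher_gains)
```

A positive t here would be in the direction the validation hypothesis predicts. Because the pre-tests are shorter than the post-tests, gains would presumably have to be put on a common scale first; that step is left out of this sketch.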



The writers are well aware of the notorious deficiencies in
administrator ratings of teacher competence. Yet, for this first assessment
of the performance tests' validity it seems reasonable that a group of
teachers who have been judged excellent by administrators and supervisors
should certainly be able to out-perform a group of non-teachers. This
assumption, of course, is central to the rationale of the study.

If the performance tests withstand this initial trial, it can then
be seen if they discriminate between other groups such as (1) teachers
whose past records indicate great ability to produce pupil achievement
(as reflected by student performance data) and (2) teachers whose records
indicate the opposite. The tests should also discriminate between samples
of (1) experienced teachers and (2) inexperienced teachers who have received
special, intensive training in how to bring about student behavior change
with respect to specified objectives. These studies, however, depend on
the performance tests' first passing the validity hurdle represented by
the presently planned investigation.

Development of the Tests. At this point initial versions of two tests
have been developed in the fields of electronics and auto-mechanics. A
difficult early decision regarding these two tests involved the selection
of topics for the unit. Ideally, it was hoped that topics should be (1)
sufficiently important so that teachers would be willing to include them
in their curricula, and (2) sufficiently unique so that great student
familiarity with the topic would not be common. It was also hoped that
topics could be selected which could be inserted rather freely at various
points during the academic year.

With these criteria in mind, the topic selected for the electronics
unit was the "General Principles of Electronics Troubleshooting," and the
topic selected for auto-mechanics was "Carburetion." Although development
of the social science test has just commenced, the topic tentatively
selected is "Research Methods in the Social Sciences."

Developmental work on the two units occurred in the following pattern.
First, topics meeting the above criteria which might be covered in two
or three weeks were selected. These were then submitted to several subject
matter specialists who served as consultants during the project. From
these tentative topics, two were selected and a series of instructional
objectives were prepared which were also screened by consultants. A
preliminary set of these objectives was agreed on and test items based
directly on the objectives were developed. In addition, possible learning
activities and reference materials were assembled. In some instances
these learning activities were designed to be particularly pertinent to
the given objectives. In other cases, the activities were planned to be
"flashy" but not germane to the objectives. It was thought that less
experienced instructors might be attracted to the "flashy," irrelevant
activities, but that the sophisticated teacher would tend to use the
pertinent activities. These materials were revised several times prior to
initial trial. It is, of course, possible that the teacher might choose
to develop his own activities and not use any of the materials provided
in the unit.



The early forms of the post-tests were given to several teachers for
administration to classes of students currently taking electronics or auto
mechanics courses. Such data underwent item analysis procedures which
resulted in the improvement of many test items.
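
The paper does not say which item statistics were computed. As a hedged illustration of what an item analysis of this kind commonly involves, the sketch below computes two conventional indices for items scored right or wrong: difficulty (proportion of pupils answering correctly) and an upper-lower discrimination index. The function name and the response matrix are hypothetical.

```python
from statistics import mean


def item_analysis(responses):
    """Return (difficulty, discrimination) for each item.

    responses: one list per pupil, holding 1 (correct) or 0 (incorrect)
    for every item. Discrimination is the proportion correct in the
    top-scoring half minus the proportion correct in the bottom half.
    """
    totals = [sum(pupil) for pupil in responses]
    # Pupil indices ordered from lowest to highest total score
    order = sorted(range(len(responses)), key=lambda i: totals[i])
    half = len(order) // 2
    low, high = order[:half], order[-half:]
    results = []
    for j in range(len(responses[0])):
        difficulty = mean(pupil[j] for pupil in responses)
        discrimination = (
            mean(responses[i][j] for i in high)
            - mean(responses[i][j] for i in low)
        )
        results.append((difficulty, discrimination))
    return results


# Illustrative data: four pupils, three items
sample = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 1]]
stats = item_analysis(sample)
```

An item with discrimination near zero or below, such as the third item in the sample, is the kind of item that an analysis like this would flag for revision.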

When ready for the first field trial both the carburetion and
electronics unit consisted solely of objectives measurable by paper and
pencil tests. The instructional time allotted to each was ten hours. Currently
the carburetion materials (with 49 objectives) consists of 22 pages while
the electronics unit (with 23 objectives) consists of 41 pages. Both units
were considered incomplete, for it is planned during the next year to add
objectives demanding performance on actual carburetors and electronic
circuit boards which will require the instructional period to be lengthened
from 10 to 15 hours.

For each unit a pre- and post-test were prepared. The post-test for
electronics consists of 52 multiple choice items and the post-test for
carburetion consists of 97 multiple choice items. Both of these tests
were drawn specifically from the objectives. Items for the pre-tests were
randomly selected from the post-test items in order to provide measures
which could be completed in approximately 20 minutes. The pre-test for
electronics contained 17 items while the carburetion pre-test had 20
items.
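
Drawing a short pre-test at random from the post-test item pool can be sketched as follows. The item counts are those given above; the function name, the numbering of items from 1, and the seed are assumptions made for the example, and the paper does not describe the actual randomization procedure.

```python
import random


def draw_pretest(post_test_items, n_items, seed=None):
    """Pick n_items distinct post-test items at random (no replacement)."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return sorted(rng.sample(post_test_items, n_items))


# Hypothetical item numbering: post-test items are labeled 1..k
electronics_post = list(range(1, 53))   # 52 items
carburetion_post = list(range(1, 98))   # 97 items

electronics_pre = draw_pretest(electronics_post, 17, seed=1966)
carburetion_pre = draw_pretest(carburetion_post, 20, seed=1966)
```

Sampling without replacement guarantees that no item appears twice on a pre-test, and every pre-test item is also a post-test item, so pre-test scores cover a subset of the same objectives.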









Initial Tryout. Both performance tests were given their first test
during January and February, 1966. The electronics test was tried out in
eight electronics classes at Los Angeles Trade Technical College.⁷ The
carburetion performance test was tried out in two classes, one at the
junior college level at Los Angeles Trade Technical College, and one at
Fullerton Union High School.⁸ Thirty-six students took part in the
carburetion tryout and 108 students in the electronics tryout.



Each instructor was given a copy of the unit objectives and the
resource materials approximately one week prior to the time he was to teach
the unit. Each was instructed to attempt to accomplish the objectives
stated in the unit, but to use any instructional techniques he wished. For
purposes of this trial, participating instructors were also asked to make
suggestions regarding ways in which the materials could be improved.



Arrangements were made with each teacher so that a member of the
research staff administered (1) a twenty minute pre-test during the first
day of the ten hours devoted to the unit and (2) a fifty minute post-test
at its conclusion. In addition, a questionnaire was administered to the
students at the close of the unit. A questionnaire was also given to the
teacher at that time soliciting his suggestions regarding the unit. Finally,
in two of the electronics classes and one carburetion class the Wonderlic
Personnel Test, a 12 minute test of "problem solving ability," was
administered to the students at the time of the pre-test. In all, 25 different
variables were represented by the two questionnaires and the Wonderlic.
The trials were completed between the dates of January 17 and February 10,
1966.

Analysis. Two different types of analysis were conducted on the
preliminary data. The first was to compute item analyses and coefficients of
internal consistency (Kuder Richardson Formula 20) on the pre- and
post-tests. The second was to compute intercorrelations among the several
measures. Key interest in the latter analysis focused on the possibility of
detecting variables which could be used, in part, to control for
differences among the pupils due to such factors as "set" toward the unit's
material, intelligence, etc. Further, of course, the responses of the
instructor were carefully considered. The overall purpose of the initial
analysis was essentially heuristic. We were attempting to find possible
variables to be considered in subsequent trials of the materials.
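
For readers unfamiliar with Kuder Richardson Formula 20, a minimal sketch of the computation follows. It uses the standard form KR-20 = (k / (k - 1)) * (1 - sum(p*q) / variance of total scores), with the population variance of the totals. The function name and the response matrix are hypothetical, and the paper does not state which variance convention was used.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson Formula 20 for items scored 1 or 0.

    responses: one list per pupil with a 1/0 score for each item.
    """
    n_pupils = len(responses)
    k = len(responses[0])
    totals = [sum(pupil) for pupil in responses]
    total_variance = pvariance(totals)  # population variance of totals
    sum_pq = 0.0
    for j in range(k):
        p = sum(pupil[j] for pupil in responses) / n_pupils  # proportion correct
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / total_variance)


# Illustrative data: four pupils, three items
sample = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
reliability = kr20(sample)
```

For this small matrix the total-score variance is 1.25 and the item variances sum to 0.625, giving a coefficient of 0.75. Values approach 1 as the items rank pupils more consistently.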



It was fully expected that a great number of deficiencies would exist
in the first experimental versions of both the performance tests.
Procedural deficiencies regarding such details as test administration,
relations with instructors, etc., were also anticipated.



⁷Appreciation is expressed to the administration and staff of Los Angeles
Trade Technical College for their participation in this investigation.

⁸Appreciation is also expressed to the administration and staff of
Fullerton Union High School for their participation in this investigation.



Results. The performance of students on the pre- and post-tests is
summarized in Table I. Item analysis results revealed a considerable
number of items, particularly in the electronics tests, which were in
need of revision. The KR coefficients were markedly higher for the
carburetion tests than for the electronics tests.



Table I

Electronics and Carburetion Pre- and Post-Tests Results

                          Electronics              Carburetion
                      Pre-Test   Post-Test     Pre-Test   Post-Test

Number of Pupils         108         95            36         33
Number of Items           17         52            20         97
Mean                    9.65      23.68         10.41      51.64
Standard Deviation      2.71       9.37          3.48      17.86
KR 20 Coefficient        .56        .44           .71        .78






Of the 23 variables constituting the pupil and teacher questionnaires
and the Wonderlic test, interest centered on those which might be of value
in adjusting for initial differences among pupils and/or teachers.
However, some of the questionnaire items were not constructed for this
purpose. For instance, teachers were asked to list the number of
instructional hours they actually used during the unit. While one might be
interested in the possible correlation between instructional time
invested and pupil achievement, there is no need to control for such
instructional variables. On the other hand, if related to post-test
scores, pupil entry behavior variables, such as grade point average, as
well as teacher variables, such as attitude toward the unit's objectives,
might permit statistical covariance adjustments in analyzing data.
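
One common form of covariance adjustment can be sketched as follows. This is an illustration, not the authors' procedure: it adjusts each class's mean post-test score for a single covariate (for example, grade point average) using the pooled within-class regression slope, as in a one-covariate analysis of covariance. All names and numbers are hypothetical.

```python
from statistics import mean


def adjusted_means(groups):
    """Covariate-adjusted group means.

    groups: dict mapping a class label to a list of
    (covariate, post_score) pairs, one pair per pupil.
    """
    grand_x = mean(x for pairs in groups.values() for x, _ in pairs)
    sxy = sxx = 0.0
    summary = {}
    for label, pairs in groups.items():
        x_bar = mean(x for x, _ in pairs)
        y_bar = mean(y for _, y in pairs)
        summary[label] = (x_bar, y_bar)
        # Accumulate within-class sums of products and squares
        sxy += sum((x - x_bar) * (y - y_bar) for x, y in pairs)
        sxx += sum((x - x_bar) ** 2 for x, _ in pairs)
    slope = sxy / sxx  # pooled within-class regression slope
    return {
        label: y_bar - slope * (x_bar - grand_x)
        for label, (x_bar, y_bar) in summary.items()
    }


# Illustrative data: class B starts with higher grade point averages
classes = {
    "A": [(2.0, 40), (3.0, 50)],
    "B": [(3.0, 45), (4.0, 55)],
}
adjusted = adjusted_means(classes)
```

In this made-up example class B has the higher raw post-test mean (50 against 45), but after adjustment for its pupils' higher grade point averages class A comes out ahead (50 against 45). That reversal is the kind of correction for pupil differences the paragraph above has in view.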

In the case of carburetion, the variables which were most strongly
related to post-test performance were the following (all positive
relationships): (1) pupils' overall grade point averages, r=.68; (2) pupils'
estimate of the pre-test's difficulty, i.e., the students who thought it
easier tending to score higher on the post-test, r=[...]; (3) pupils'
expressed interest in the general field of auto mechanics, i.e., the
students responding with less interest tending to score higher on the
post-test, r=.59; (4) pupils' expressed interest in carburetion prior
to beginning the unit, i.e., lower interest associated with high
post-test performance, r=.57; and (5) pre-test scores, r=.56. The
correlation between Wonderlic scores and post-test scores for only one class
of 20 pupils was [...].

For electronics, the variables most highly related to post-test
performance were: (1) pupils' estimate of the pre-test's difficulty,
i.e., students who thought it easier tending to score higher on the
post-test, r=.56; (2) pupils' overall grade point average, r=.49;
(3) pupils' expressed interest in the general field of electronics,
i.e., students expressing less interest tending to score higher on
the post-test, r=.47; and (4) pupils' expressed interest in electronics
troubleshooting prior to beginning the unit, lesser interest
associated with higher post-test performance, r=[...]. [...]
existed between post-test scores and pre-test performance (r = .05) as
well as Wonderlic performance (r = .05).
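
The coefficients reported above are presumably product-moment correlations, though the paper does not say so explicitly. The sketch below computes Pearson's r and also shows why the sign of an "interest" correlation depends on how the questionnaire scale is coded. The scale direction and all data are hypothetical; the paper does not give the coding of its items.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def flip_scale(xs, low, high):
    """Reverse-code a rating scale running from low to high."""
    return [low + high - x for x in xs]


# Illustrative data: ratings on a 1-5 scale and post-test scores
ratings = [1, 2, 3, 4, 5]
scores = [2, 1, 4, 3, 5]
r_as_coded = pearson_r(ratings, scores)
r_reversed = pearson_r(flip_scale(ratings, 1, 5), scores)
```

Reverse-coding the scale changes only the sign of r, not its size. A positive coefficient for "less interest, higher score" is therefore consistent with a scale on which larger numbers mean less interest, which is an assumption here rather than something the paper states.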



The chief suggestions from participating instructors concerned
the addition and deletion of certain objectives for the units. A number
of criticisms were made of the technical terminology employed in the
objectives and reference materials. Many instructors thought that the
topics could not be adequately treated in ten instructional hours. Many
minor deficiencies in the quality of the reference materials were also
noted.



Discussion. This first trial of the two performance tests yielded the
anticipated results, namely, a host of defects in the procedures and
materials employed. The internal consistency coefficients for the
electronics pre- and post-tests were particularly low and will warrant
factor analytic treatment to see if (1) we are indeed measuring two or
more relatively distinct dimensions or, as is more likely, (2) the
test has too many poor items.



The strength of the relationships between the post-test measures
and some of the pupil variables which might be used for control
purposes was rather encouraging. Several of the relationships were rather
perplexing, however, including the tendency for those expressing less
interest in the general field and specific unit to score higher
on the post-test. Common sense might suggest that the opposite would
be true. The Wonderlic Personnel Test, apart from its ease of
administration, seemed to offer little promise in this study as a predictor
of post-test performance. In the case of carburetion, the pre-test
score appeared to be a much more effective predictor. Hopefully,
further analysis of these data and subsequent field trials will reveal
a number of measures which can be used to reduce variation among
different classes of pupils.

Even though this initial set of data must be scrutinized with much
more care, it is apparent even now that a great deal more work must be
expended in improving the materials and measures involved in these
performance tests of teaching effectiveness.



